Stochastic Neighbor Compression
نویسندگان
چکیده
We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smaller synthetic data set, that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up kNN testing drastically (up to several orders of magnitude, in our experiments); it makes the kNN classifier substantially more robust to label noise; on 4 of 7 data sets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; finally, the SNC compression leads to impressive speed ups over kNN even when kNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction—demonstrating that it is complementary to existing state-of-the-art algorithms to speed up kNN classification and leads to substantial further improvements.
منابع مشابه
Neural Codes for Image Retrieval
This seminar report focuses on using convolutional neural networks for image retrieval. Firstly, we give a thorough discussion of several state-of-the-art techniques in image retrieval by considering the associated subproblems: image description, descriptor compression, nearest-neighbor search and query expansion. We discuss both the aggregation of local descriptors using clustering and metric ...
متن کاملAccelerating Fractal Image Compression by Multi-Dimensional Nearest Neighbor Search
In fractal image compression the encoding step is computationally expensive. A large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find a best match for another image portion. Our theory developed here shows that this basic procedure of fractal image compression is equivalent to multi-dimensional nearest neighbor search. This res...
متن کاملStochastic neighbor embedding (SNE) for dimension reduction and visualization using arbitrary divergences
We present a systematic approach to the mathematical treatment of the t-distributed stochastic neighbor embedding (t-SNE) and the stochastic neighbor embedding (SNE) method. This allows an easy adaptation of the methods or exchange of their respective modules. In particular, the divergence which measures the difference between probability distributions in the original and the embedding space ca...
متن کاملAdaptive approximate nearest neighbor search for fractal image compression
Fractal image encoding is a computationally intensive method of compression due to its need to find the best match between image subblocks by repeatedly searching a large virtual codebook constructed from the image under compression. One of the most innovative and promising approaches to speed up the encoding is to convert the range-domain block matching problem to a nearest neighbor search pro...
متن کاملBreaking the Time Complexity of Fractal Image Compression
In fractal image compression the encoding step is computationally expensive. A large number of sequential searches through a list of domains (portions of the image) are carried out while trying to nd a best match for another image portion. We show that this step can be replaced by multi-dimensional nearest neighbor search which runs in logarithmic time instead of linear time required for the co...
متن کامل